Skip to content

feat: PG store config override + cross-replica auth sync#2495

Open
xandr0s wants to merge 2 commits intorouter-for-me:mainfrom
xandr0s:feat/pgstore-config-override-and-auth-sync
Open

feat: PG store config override + cross-replica auth sync#2495
xandr0s wants to merge 2 commits intorouter-for-me:mainfrom
xandr0s:feat/pgstore-config-override-and-auth-sync

Conversation

@xandr0s
Copy link
Copy Markdown

@xandr0s xandr0s commented Apr 2, 2026

Summary

Two features for Kubernetes multi-replica deployments with PostgreSQL store:

1. Config override via PGSTORE_CONFIG_OVERRIDE_PATH

When this env var is set, the file at that path is read on bootstrap and written into PG store, overriding any previously stored config. This lets k8s ConfigMaps take precedence over PG-stored config on every pod restart.

Problem: PG store seeds config from file on first run, then always reads from DB — ignoring ConfigMap updates on subsequent deploys.

2. Auth sync polling via StartAuthSync()

Periodically polls auth_store table and incrementally updates local auth files. When one replica adds/refreshes OAuth credentials, other replicas pick them up within 30s.

Implementation details:

  • Incremental sync: compares content before writing, only writes changed files
  • Mutex-protected: prevents races with concurrent file access
  • Cleans up local files removed from PG
  • Works with existing fsnotify file watcher for seamless client reload

Changes

  • internal/store/postgresstore.go: Add StartAuthSync() + incrementalAuthSync(), config override in syncConfigFromDatabase()
  • cmd/server/main.go: Call StartAuthSync() after PG store bootstrap

Both features are opt-in and fully backward compatible.

Test plan

  • Deploy with PGSTORE_CONFIG_OVERRIDE_PATH — verify ConfigMap overrides PG config on restart
  • Deploy 2 replicas — add OAuth account on replica 1, verify replica 2 picks it up within 30s
  • Deploy without env var — verify existing behavior unchanged
  • Verify no excessive fsnotify events when auth files unchanged

Two features for Kubernetes multi-replica deployments with PG store:

1. Config override via PGSTORE_CONFIG_OVERRIDE_PATH env var:
   When set, the file at that path is read on bootstrap and written
   into PG store, overriding any previously stored config. This lets
   k8s ConfigMaps take precedence over PG-stored config on every
   pod restart.

2. Auth sync polling via StartAuthSync():
   Periodically polls auth_store table and incrementally updates local
   auth files. Only writes files whose content has actually changed,
   avoiding unnecessary fsnotify events. Uses mutex to prevent races
   with concurrent file access. Cleans up local files removed from PG.

Both features are opt-in and fully backward compatible.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for several new login providers, including Kiro, GitLab, Cursor, and GitHub Copilot, along with an incognito mode for browser-based authentication. It also implements a background synchronization mechanism in the Postgres store to support multi-replica credential sharing and adds an environment variable for overriding database configuration with local files. Feedback highlights a redundant call to the synchronization loop, a regression in header processing, and logic errors in the file cleanup process that could lead to orphaned files or incorrect deletions.

Comment on lines +316 to +317
pgStoreInst.StartAuthSync(context.Background(), 30*time.Second, nil)
pgStoreInst.StartAuthSync(context.Background(), 30*time.Second, nil)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The StartAuthSync method is called twice consecutively with the same parameters. This starts two redundant background synchronization loops, which is inefficient and could lead to unnecessary resource contention on the store's internal mutex.

Suggested change
pgStoreInst.StartAuthSync(context.Background(), 30*time.Second, nil)
pgStoreInst.StartAuthSync(context.Background(), 30*time.Second, nil)
pgStoreInst.StartAuthSync(context.Background(), 30*time.Second, nil)

NextRefreshAfter: time.Time{},
}
cliproxyauth.ApplyCustomHeadersFromMetadata(auth)
auths = append(auths, auth)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The call to cliproxyauth.ApplyCustomHeadersFromMetadata(auth) was removed. This appears to be a regression, as it prevents custom headers defined in the metadata from being applied to the authentication object. This will likely break functionality for providers that rely on these headers.

		cliproxyauth.ApplyCustomHeadersFromMetadata(auth)
		auths = append(auths, auth)

Comment on lines +213 to +220
entries, _ := os.ReadDir(s.authDir)
for _, e := range entries {
if e.IsDir() || seen[e.Name()] {
continue
}
os.Remove(filepath.Join(s.authDir, e.Name()))
changed = true
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The cleanup logic for local auth files is flawed:

  1. Not recursive: It skips directories (e.IsDir()), so files in subdirectories (which are created on line 201) will never be deleted if their database records are removed.
  2. Incorrect key matching: It checks seen[e.Name()], but seen is keyed by the full id (e.g., provider/user.json), while e.Name() is only the base filename (e.g., provider). This will cause incorrect deletions or failures to delete.
  3. Error handling: Errors from os.ReadDir and os.Remove are ignored.

Consider using filepath.WalkDir to correctly identify and remove orphaned files across the entire directory structure.

Comment on lines +153 to +169
ticker := time.NewTicker(interval)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
changed, err := s.incrementalAuthSync(ctx)
if err != nil {
log.Printf("postgres store: auth sync poll: %v", err)
continue
}
if changed && onChange != nil {
onChange()
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using time.NewTicker in a background goroutine can cause synchronization tasks to pile up if incrementalAuthSync takes longer than the interval. Although the store uses a mutex, multiple goroutines could end up blocked waiting for the lock. A more robust approach is to use a loop with time.Sleep or time.After to ensure a fixed delay between the completion of one sync and the start of the next.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7e4c1e50ce

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

NextRefreshAfter: time.Time{},
}
cliproxyauth.ApplyCustomHeadersFromMetadata(auth)
auths = append(auths, auth)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore custom header hydration for Postgres auth records

Re-add cliproxyauth.ApplyCustomHeadersFromMetadata(auth) before appending each auth in List. This commit removed that call, so PG-backed auth entries no longer populate Attributes["header:..."] from metadata, unlike the git/object stores; any provider that depends on custom headers will silently lose them after load/restart and requests can fail or be routed incorrectly.

Useful? React with 👍 / 👎.

cfg.AuthDir = pgStoreInst.AuthDir()
log.Infof("postgres-backed token store enabled, workspace path: %s", pgStoreInst.WorkDir())
pgStoreInst.StartAuthSync(context.Background(), 30*time.Second, nil)
pgStoreInst.StartAuthSync(context.Background(), 30*time.Second, nil)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Remove duplicate auth sync poller startup

The second StartAuthSync call starts an extra goroutine with the same 30s interval, so each instance polls and syncs auth data twice. In production PG deployments this doubles steady-state database load and file-sync churn for no functional gain.

Useful? React with 👍 / 👎.

…s, fix cleanup

- Remove duplicate StartAuthSync call in main.go
- Restore cliproxyauth.ApplyCustomHeadersFromMetadata(auth) in List()
- Fix file cleanup: use filepath.WalkDir for recursive traversal,
  match by relative path (not just filename)
- Replace time.Ticker with time.After to prevent sync pileup
@xandr0s xandr0s force-pushed the feat/pgstore-config-override-and-auth-sync branch from 672e058 to dc88a48 Compare April 2, 2026 16:54
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dc88a4803c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

if err == nil {
cfg.AuthDir = pgStoreInst.AuthDir()
log.Infof("postgres-backed token store enabled, workspace path: %s", pgStoreInst.WorkDir())
pgStoreInst.StartAuthSync(context.Background(), 30*time.Second, nil)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use a bounded context for auth sync polling

StartAuthSync is started with context.Background(), but incrementalAuthSync takes the store mutex before QueryContext; if the DB call stalls (for example during a network partition), the poll can block indefinitely while holding s.mu, which also blocks Save, Delete, and PersistAuthFiles on that replica. Please pass a cancellable context with a per-poll timeout instead of an unbounded background context.

Useful? React with 👍 / 👎.

Comment on lines +202 to +203
if err := os.WriteFile(path, []byte(payload), 0o600); err != nil {
return false, fmt.Errorf("write auth file: %w", err)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Prevent poller from clobbering pending local auth edits

This overwrite path applies whenever file bytes differ from the DB payload, so in PG mode a direct auth-file edit can be lost if the poll runs before watcher-driven PersistAuthFiles finishes (that persistence is asynchronous): the poll writes stale DB content back to disk, and the stale version is then re-persisted. That makes file-based auth updates non-deterministic under concurrent polling.

Useful? React with 👍 / 👎.

@xandr0s
Copy link
Copy Markdown
Author

xandr0s commented Apr 2, 2026

Thanks for the thorough review! Regarding the P2 concern about poll clobbering local edits:

In our deployment, auth credentials are always added via management UI or CLI login — both write to PG store first, then mirror to disk. The poll only syncs PG → local, never the reverse direction. So the race between poll and local file edits doesn't apply to our use case.

That said, adding an updated_at comparison (skip overwrite if DB record is older than local file mtime) would make this more robust for mixed workflows. Happy to add that if you think it's worth it for the general case.

Copy link
Copy Markdown

@xkonjin xkonjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: PG Store Config Override + Cross-Replica Auth Sync

Summary

Well-architected feature for multi-replica deployments. The auth sync mechanism enables horizontal scaling without sticky sessions, and the config override provides necessary flexibility for k8s environments.

Strengths

  • Smart incremental sync — only writes changed files, avoiding fsnotify storms
  • Atomic file operations — proper temp file + rename pattern in incrementalAuthSync
  • Config override is well-designed — PGSTORE_CONFIG_OVERRIDE_PATH provides clean k8s integration
  • Orphan cleanup — removes local auth files not present in DB, preventing stale credential accumulation

Issues

1. Potential race condition in auth sync startup
In main.go, StartAuthSync is called immediately after pgStoreInst creation, but there's a small window where the bootstrap may not have completed. Consider starting sync after Bootstrap() returns.

2. Error handling in orphan cleanup is silent
If removal fails (permissions, filesystem issues), it's silently ignored. Consider logging at WARN level.

3. No backoff for sync failures
The sync loop logs errors but retries immediately on the next interval. If PG is temporarily unavailable, this could spam logs. Consider adding a simple backoff or at least a counter.

4. Missing test coverage

  • No tests for incrementalAuthSync logic
  • No tests for config override fallback path (when override file read fails)
  • No tests for orphan cleanup behavior

Security

  • Proper file permissions (0o600 for files, 0o700 for dirs)
  • SQL queries use parameterized queries implicitly via QueryContext
  • Consider adding authentication/authorization on the auth_store table access if multiple untrusted pods share the PG instance

Performance

The 30-second poll interval is reasonable for auth propagation, but consider making it configurable for latency-sensitive environments

Verdict

Good foundational work for multi-replica support. The suggestions are improvements, not blockers. Recommend adding tests before merge if time permits.

Copy link
Copy Markdown
Collaborator

@luispater luispater left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary:
Adds two Postgres-store features aimed at multi-replica/Kubernetes deployments: (1) PGSTORE_CONFIG_OVERRIDE_PATH to force an on-disk config to win on bootstrap, and (2) auth sync polling to mirror auth_store changes into local auth files.

Key findings:

  • Config override is applied early during bootstrap and keeps the DB + local spool consistent.
  • Auth sync is incremental (content-compare before write) and serialized to reduce race risk and unnecessary fsnotify churn.
  • The poll loop uses time.After(interval) to avoid ticker pileups if a sync iteration runs long.

Non-blocking suggestions:

  • In the cleanup pass, consider normalizing rel with filepath.ToSlash(rel) before checking against seen to avoid OS-separator mismatches.
  • Consider starting StartAuthSync() with a cancellable lifecycle context instead of context.Background() for graceful shutdown (optional).
  • Consider guarding/documenting interval > 0 to avoid a tight loop if misconfigured (optional).

Test plan:

  • go test ./...

This is an automated Codex review result and still requires manual verification by a human reviewer.

@luispater luispater added the codex label Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants